
fix(provider/openai-compatible): trust user-declared modalities, dont inject fake ERROR text #83

Merged
Alezander9 merged 1 commit into main from alex/openai-compat-trust-user-modalities
May 18, 2026


@Alezander9 Alezander9 commented May 18, 2026

Summary

In packages/opencode/src/provider/transform.ts, unsupportedParts replaces image/file parts with an inline ERROR: Cannot read <name> (this model does not support <modality> input) text part when model.capabilities.input.<modality> is false. This is the right behavior for native providers, where models.dev knows the truth. For @ai-sdk/openai-compatible (the user-configured proxy provider) it is a silent footgun: there is no models.dev entry to consult, so image/audio/video/pdf all default to false unless the user explicitly declares modalities in opencode.json, and bcode silently strips screenshots before they reach vision-capable upstream models.

Repro

Browser Use cloud's V4 LLM gateway is an openai-compatible proxy in front of Anthropic. Worker opencode.json declared:

"models": { "claude-opus-4.7": { "name": "claude-opus-4.7 via V4 gateway" } }

Vision smoketest: agent screenshots example.com via browser_execute, writes its verdict to a file. Result every run: "I CANNOT SEE THE IMAGE". Gateway-side request audit confirmed zero image_url / file / data:image/ signals in the wire body even when the synthetic "Attached media from tool result:" user message was hoisted with file parts. The file part was being replaced by unsupportedParts before streamText serialized the body — the model saw the fabricated error text as if it came from the user.

Capability derivation reference (provider.ts:1298-1304):

image: model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false,
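The fallback chain on that line can be traced with a small standalone sketch. The type shapes and the function name deriveImageCapability are hypothetical simplifications, not the real definitions in provider.ts; only the expression itself is taken from the line quoted above.

```typescript
// Hypothetical, trimmed-down shapes; the real types in provider.ts carry more fields.
type Modality = "text" | "image" | "audio" | "video" | "pdf"

interface DeclaredModel {
  modalities?: { input?: Modality[] }
}

interface KnownModel {
  capabilities: { input: { image: boolean } }
}

// Mirrors the quoted chain: user declaration, then existing (models.dev) entry, then false.
function deriveImageCapability(model: DeclaredModel, existingModel?: KnownModel): boolean {
  return model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false
}

// User-configured proxy model: no declared modalities and no models.dev entry.
const undeclared = deriveImageCapability({})

// The same model once the user declares image input.
const declared = deriveImageCapability({ modalities: { input: ["text", "image"] } })

// A declared list without "image" yields false from includes(), so `??` does not
// fall through to the existing entry even if that entry says true.
const declaredTextOnly = deriveImageCapability(
  { modalities: { input: ["text"] } },
  { capabilities: { input: { image: true } } },
)

console.log({ undeclared, declared, declaredTextOnly })
```

Because includes() returns a boolean rather than undefined, the later fallbacks only apply when modalities.input is absent altogether, which is exactly the situation in the repro.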

When neither the user-declared model nor a models.dev fallback declares modalities, image defaults to false. unsupportedParts then replaces the file part with:

{
  type: "text",
  text: `ERROR: Cannot read "${filename}" (this model does not support image input). Inform the user.`,
}

Fix

Early-return for @ai-sdk/openai-compatible — forward image/file parts as-is. If upstream truly can't handle them, the provider call returns a real error (e.g. Anthropic 400 "unsupported media type") which is far more debuggable than fabricated capability text the model reads back to the user.

if (model.api.npm === "@ai-sdk/openai-compatible") return msgs

Native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) keep the existing check — models.dev IS authoritative for them, and the filter prevents a real 4xx upstream while giving the user a clear local error.
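As a rough illustration of where the early return sits, here is a minimal sketch of the filter. The Part, Message, and Model shapes (including the mediaType field) are hypothetical stand-ins for the real types in transform.ts, and the sketch handles only image files for brevity; only the early-return line and the error text are taken from this PR.

```typescript
// Hypothetical minimal shapes; the real message and model types are richer.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; filename: string; mediaType: string }

interface Message {
  role: "user" | "assistant"
  content: Part[]
}

interface Model {
  api: { npm: string }
  capabilities: { input: { image: boolean } }
}

function unsupportedParts(msgs: Message[], model: Model): Message[] {
  // User-configured proxies: forward media untouched and let upstream answer.
  if (model.api.npm === "@ai-sdk/openai-compatible") return msgs

  // Native providers: keep replacing image parts the model is known not to support.
  return msgs.map((msg) => ({
    ...msg,
    content: msg.content.map((part): Part => {
      if (part.type !== "file") return part
      if (!part.mediaType.startsWith("image/") || model.capabilities.input.image) return part
      return {
        type: "text",
        text: `ERROR: Cannot read "${part.filename}" (this model does not support image input). Inform the user.`,
      }
    }),
  }))
}

// One screenshot, sent once through a proxy model and once through a native model,
// both with the image capability derived as false.
const screenshot: Message[] = [
  { role: "user", content: [{ type: "file", filename: "shot.png", mediaType: "image/png" }] },
]
const viaProxy = unsupportedParts(screenshot, {
  api: { npm: "@ai-sdk/openai-compatible" },
  capabilities: { input: { image: false } },
})
const viaNative = unsupportedParts(screenshot, {
  api: { npm: "@ai-sdk/anthropic" },
  capabilities: { input: { image: false } },
})

console.log(viaProxy[0]?.content[0]?.type, viaNative[0]?.content[0]?.type)
```

In this sketch the proxy path hands back the original messages with the file part intact, while the native path still swaps the part for the error text.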

Why not require users to declare modalities?

That's the workaround on our side (declaring modalities: { input: ['text', 'image'] } in opencode.json), which we're also shipping. But:

  1. The current behavior is a silent footgun — there's no warning, the model just receives weird text.
  2. The error text masquerades as user input, which trains the model to "explain why it can't see the image" instead of surfacing the bug.
  3. For user-configured proxies bcode genuinely doesn't know what's downstream; defaulting to "strip everything" is the wrong direction. "Forward and let upstream answer" is honest.
  4. Diff is 1 line + comment.
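For reference, the workaround mentioned above would look roughly like the following when applied to the model entry from the Repro section. This is a sketch that uses only the keys quoted in this PR; the enclosing provider block is omitted, and whether opencode.json expects further fields alongside input is an assumption left open here.

```typescript
// Sketch of the "models" fragment from the Repro section with the workaround applied.
const models = {
  "claude-opus-4.7": {
    name: "claude-opus-4.7 via V4 gateway",
    // Declaring image input makes the derived image capability true,
    // so unsupportedParts no longer replaces screenshots.
    modalities: { input: ["text", "image"] },
  },
}

// Emit the fragment as JSON so it can be merged into opencode.json by hand.
const fragment = JSON.stringify({ models }, null, 2)
console.log(fragment)
```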

Risk

Other openai-compatible users who point bcode at a non-vision endpoint will now receive an upstream 4xx instead of a locally stripped part with a fake error. For them this is strictly an error-message-quality regression; the model never received the image in either case.


Summary by cubic

Forward media parts for @ai-sdk/openai-compatible instead of replacing them with fake "ERROR: Cannot read…" text. This fixes silent stripping of images/files and lets upstream return real, debuggable errors when a modality isn’t supported.

  • Bug Fixes
    • Bypass unsupportedParts for @ai-sdk/openai-compatible; media is forwarded as-is.
    • Keep capability filtering for native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) where models.dev is authoritative.

Written for commit cbd37a8.

…t inject fake ERROR text

unsupportedParts (transform.ts) replaces image/file parts with an inline 'ERROR: Cannot read <name> (this model does not support <modality> input)' text part when model.capabilities.input.<modality> is false. The capability is derived in provider.ts:1298-1304 — for @ai-sdk/openai-compatible (the user-configured proxy) there is no models.dev entry, so image/audio/video/pdf all default to false unless the user explicitly declares modalities in opencode.json.

This is a silent footgun in the user-configured-proxy case. The screenshot is stripped before the wire, and the model receives the fabricated 'ERROR: Cannot read image' text as if it came from the user — producing nonsensical replies like 'I can't see the image' even though upstream supports vision.

Forwarding the part for openai-compatible providers is honest: if upstream truly can't handle it, we get a real provider error from the API call, which is far more debuggable than fabricated capability text. Native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) keep the existing check because models.dev IS authoritative for them — there the filter prevents a real 4xx upstream and gives the user a clear local error.

Reproed against browser-use cloud's V4 LLM gateway (an openai-compatible proxy in front of Anthropic): vision smoketest replied 'I cannot see the image' on every run; gateway-side request audit confirmed zero image_url / file / data:image/ signals in the body even when the agent attached a screenshot via browser_execute. With this change image parts reach the wire and the model produces real vision replies.

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file


@Alezander9 Alezander9 merged commit 8f9db17 into main May 18, 2026
3 checks passed